Goto

Collaborating Authors

 classification technique


Complex Networks for Pattern-Based Data Classification

Chire, Josimar, Mahmood, Khalid, Liang, Zhao

arXiv.org Artificial Intelligence

Data classification techniques partition the data or feature space into smaller sub-spaces, each corresponding to a specific class. To classify into subspaces, physical features e.g., distance and distributions are utilized. This approach is challenging for the characterization of complex patterns that are embedded in the dataset. However, complex networks remain a powerful technique for capturing internal relationships and class structures, enabling High-Level Classification. Although several complex network-based classification techniques have been proposed, high-level classification by leveraging pattern formation to classify data has not been utilized. In this work, we present two network-based classification techniques utilizing unique measures derived from the Minimum Spanning Tree and Single Source Shortest Path. These network measures are evaluated from the data patterns represented by the inherent network constructed from each class. We have applied our proposed techniques to several data classification scenarios including synthetic and real-world datasets. Compared to the existing classic high-level and machine-learning classification techniques, we have observed promising numerical results for our proposed approaches. Furthermore, the proposed models demonstrate the following distinguished features in comparison to the previous high-level classification techniques: (1) A single network measure is introduced to characterize the data pattern, eliminating the need to determine weight parameters among network measures. Therefore, the model is largely simplified, while obtaining better classification results. (2) The metrics proposed are sensitive and used for classification with competitive results.


Examining the Rat in the Tunnel: Interpretable Multi-Label Classification of Tor-based Malware

Karunanayake, Ishan, AlSabah, Mashael, Ahmed, Nadeem, Jha, Sanjay

arXiv.org Artificial Intelligence

Despite being the most popular privacy-enhancing network, Tor is increasingly adopted by cybercriminals to obfuscate malicious traffic, hindering the identification of malware-related communications between compromised devices and Command and Control (C&C) servers. This malicious traffic can induce congestion and reduce Tor's performance, while encouraging network administrators to block Tor traffic. Recent research, however, demonstrates the potential for accurately classifying captured Tor traffic as malicious or benign. While existing efforts have addressed malware class identification, their performance remains limited, with micro-average precision and recall values around 70%. Accurately classifying specific malware classes is crucial for effective attack prevention and mitigation. Furthermore, understanding the unique patterns and attack vectors employed by different malware classes helps the development of robust and adaptable defence mechanisms. We utilise a multi-label classification technique based on Message-Passing Neural Networks, demonstrating its superiority over previous approaches such as Binary Relevance, Classifier Chains, and Label Powerset, by achieving micro-average precision (MAP) and recall (MAR) exceeding 90%. Compared to previous work, we significantly improve performance by 19.98%, 10.15%, and 59.21% in MAP, MAR, and Hamming Loss, respectively. Next, we employ Explainable Artificial Intelligence (XAI) techniques to interpret the decision-making process within these models. Finally, we assess the robustness of all techniques by crafting adversarial perturbations capable of manipulating classifier predictions and generating false positives and negatives.


Biomarker based Cancer Classification using an Ensemble with Pre-trained Models

Lee, Chongmin, Kim, Jihie

arXiv.org Machine Learning

Certain cancer types, namely pancreatic cancer is difficult to detect at an early stage; sparking the importance of discovering the causal relationship between biomarkers and cancer to identify cancer efficiently. By allowing for the detection and monitoring of specific biomarkers through a non-invasive method, liquid biopsies enhance the precision and efficacy of medical interventions, advocating the move towards personalized healthcare. Several machine learning algorithms such as Random Forest, SVM are utilized for classification, yet causing inefficiency due to the need for conducting hyperparameter tuning. We leverage a meta-trained Hyperfast model for classifying cancer, accomplishing the highest AUC of 0.9929 and simultaneously achieving robustness especially on highly imbalanced datasets compared to other ML algorithms in several binary classification tasks (e.g. breast invasive carcinoma; BRCA vs. non-BRCA). We also propose a novel ensemble model combining pre-trained Hyperfast model, XGBoost, and LightGBM for multi-class classification tasks, achieving an incremental increase in accuracy (0.9464) while merely using 500 PCA features; distinguishable from previous studies where they used more than 2,000 features for similar results.


Classification of Tabular Data by Text Processing

Ramani, Keshav, Borrajo, Daniel

arXiv.org Artificial Intelligence

Natural Language Processing technology has advanced vastly in the past decade. Text processing has been successfully applied to a wide variety of domains. In this paper, we propose a novel framework, Text Based Classification(TBC), that uses state of the art text processing techniques to solve classification tasks on tabular data. We provide a set of controlled experiments where we present the benefits of using this approach against other classification methods. Experimental results on several data sets also show that this framework achieves comparable performance to that of several state of the art models in accuracy, precision and recall of predicted classes.


Applications of Error Back-Propagation to Phonetic Classification

Neural Information Processing Systems

This paper is concerced with the use of error back-propagation in phonetic classification. Our objective is to investigate the ba(cid:173) sic characteristics of back-propagation, and study how the frame(cid:173) work of multi-layer perceptrons can be exploited in phonetic recog(cid:173) nition. We explore issues such as integration of heterogeneous sources of information, conditioll that can affect performance of phonetic classification, internal representations, comparisons with traditional pattern classification techniques, comparisons of differ(cid:173) ent error metrics, and initialization of the network. Our investiga(cid:173) tion is performed within a set of experiments that attempts to rec(cid:173) ognize the 16 vowels in American English independent of speaker. Our results are comparable to human performance.


Comparison of three classification techniques: CART, C4.5 and Multi-Layer Perceptrons

Neural Information Processing Systems

In this paper, after some introductory remarks into the classification prob(cid:173) lem as considered in various research communities, and some discussions concerning some of the reasons for ascertaining the performances of the three chosen algorithms, viz., CART (Classification and Regression Tree), C4.5 (one of the more recent versions of a popular induction tree tech(cid:173) nique known as ID3), and a multi-layer perceptron (MLP), it is proposed to compare the performances of these algorithms under two criteria: classi(cid:173) fication and generalisation. It is found that, in general, the MLP has better classification and generalisation accuracies compared with the other two algorithms.


Prediction of Customer Churn in Banking Industry

Charandabi, Sina Esmaeilpour

arXiv.org Artificial Intelligence

With the growing competition in banking industry, banks are required to follow customer retention strategies while they are trying to increase their market share by acquiring new customers. This study compares the performance of six supervised classification techniques to suggest an efficient model to predict customer churn in banking industry, given 10 demographic and personal attributes from 10000 customers of European banks. The effect of feature selection, class imbalance, and outliers will be discussed for ANN and random forest as the two competing models. As shown, unlike random forest, ANN does not reveal any serious concern regarding overfitting and is also robust to noise. Therefore, ANN structure with five nodes in a single hidden layer is recognized as the best performing classifier.


DOC-NAD: A Hybrid Deep One-class Classifier for Network Anomaly Detection

Sarhan, Mohanad, Kulatilleke, Gayan, Lo, Wai Weng, Layeghy, Siamak, Portmann, Marius

arXiv.org Artificial Intelligence

Machine Learning (ML) approaches have been used to enhance the detection capabilities of Network Intrusion Detection Systems (NIDSs). Recent work has achieved near-perfect performance by following binary- and multi-class network anomaly detection tasks. Such systems depend on the availability of both (benign and malicious) network data classes during the training phase. However, attack data samples are often challenging to collect in most organisations due to security controls preventing the penetration of known malicious traffic to their networks. Therefore, this paper proposes a Deep One-Class (DOC) classifier for network intrusion detection by only training on benign network data samples. The novel one-class classification architecture consists of a histogram-based deep feed-forward classifier to extract useful network data features and use efficient outlier detection. The DOC classifier has been extensively evaluated using two benchmark NIDS datasets. The results demonstrate its superiority over current state-of-the-art one-class classifiers in terms of detection and false positive rates.


Review on Classification Techniques used in Biophysiological Stress Monitoring

Iqbal, Talha, Elahi, Adnan, Shahzad, Atif, Wijns, William

arXiv.org Artificial Intelligence

Cardiovascular activities are directly related to the response of a body in a stressed condition. Stress, based on its intensity, can be divided into two types i.e. Acute stress (short-term stress) and Chronic stress (long-term stress). Repeated acute stress and continuous chronic stress may play a vital role in inflammation in the circulatory system and thus leads to a heart attack or to a stroke. In this study, we have reviewed commonly used machine learning classification techniques applied to different stress-indicating parameters used in stress monitoring devices. These parameters include Photoplethysmograph (PPG), Electrocardiographs (ECG), Electromyograph (EMG), Galvanic Skin Response (GSR), Heart Rate Variation (HRV), skin temperature, respiratory rate, Electroencephalograph (EEG) and salivary cortisol, used in stress monitoring devices. This study also provides a discussion on choosing a classifier, which depends upon a number of factors other than accuracy, like the number of subjects involved in an experiment, type of signals processing and computational limitations.


[100%OFF] Logistic Regression In Python

#artificialintelligence

You're looking for a complete Classification modeling course that teaches you everything you need to create a Classification model in Python, right? You've found the right Classification modeling course! How this course will help you? A Verifiable Certificate of Completion is presented to all students who undertake this Machine learning basics course. Why should you choose this course?